regardless of what kind of analysis you use.

Dealing with missing data

Most clinical trials have incomplete data for one or more variables, which can be a real headache when analyzing your data. The statistical aspects of missing data are quite complicated, so you should consult a statistician if you have more than just occasional, isolated missing values. Here we describe some commonly used approaches for coping with missing data:

Exclusion: Exclude a case from an analysis if any of the variables required for that analysis is missing. This seems simple, but the downside is that it can reduce the number of analyzable cases, sometimes quite severely. And if a value is missing for a reason that's related to treatment efficacy, excluding the case can bias your results.
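A minimal Python sketch of this complete-case exclusion; the participant records and variable names are hypothetical:

```python
# Complete-case exclusion: drop any participant whose record is missing
# a variable required for the analysis. Data here are hypothetical.
records = [
    {"id": 1, "glucose": 95.0, "weight": 70.2},
    {"id": 2, "glucose": None, "weight": 68.5},   # missing glucose
    {"id": 3, "glucose": 102.3, "weight": None},  # missing weight
    {"id": 4, "glucose": 88.1, "weight": 81.0},
]

required = ["glucose", "weight"]
complete_cases = [r for r in records
                  if all(r[v] is not None for v in required)]
# Only 2 of the 4 cases remain analyzable — the reduction the text warns about.
```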

Imputation: Imputation means replacing a missing value with a value you impute, or create yourself. In a clinical trial, analysts typically take the mean or median of all the available values for that variable and fill that in for the missing value. In practice, you have to keep the original variable and save the imputed values as a separate variable, so that you can document the type of imputation applied. Imputation has real downsides. If you're imputing only a small number of values, it's not worth the bias it introduces; you may as well just exclude those cases. But if you impute a large number of values, you're essentially making up the data yourself, adding even more bias.
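A sketch of mean imputation that keeps the original variable untouched and records which entries were filled in, so the imputation can be documented; the glucose values are hypothetical:

```python
from statistics import mean

# Mean imputation: fill each missing value with the mean of the
# available values, stored in a separate variable. Values are hypothetical.
glucose = [95.0, None, 102.3, 88.1, None, 110.4]

observed = [v for v in glucose if v is not None]
fill = mean(observed)  # mean of the four observed values

# Keep `glucose` as-is; the imputed version is a separate variable.
glucose_imputed = [v if v is not None else fill for v in glucose]

# Flag which entries were imputed, to document the imputation applied.
imputed_flag = [v is None for v in glucose]
```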

Last Observation Carried Forward (LOCF): LOCF is a special case of imputation. Sometimes during follow-up, one of a series of sequential measurements on a particular participant is missing. For example, imagine that four weekly glucose values were supposed to be measured, and you were missing the measurement only on week three. In that case, you could use the most recent previous value in the series, which is the week two measurement, to impute the week three measurement. This technique is called Last Observation Carried Forward (LOCF) and is one of the most widely used strategies. Like all imputation, LOCF adds bias, but it adds bias in the conservative direction, making it more difficult to demonstrate efficacy. This approach is popular with regulators, who want to put the burden of proof on the drug and study sponsor.
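Using the weekly glucose example above, LOCF takes only a few lines of Python; the measurement values are hypothetical:

```python
# LOCF: replace a missing follow-up measurement with the most recent
# previous value in the series. Week 3 (index 2) is missing here.
weekly_glucose = [101.0, 97.5, None, 94.2]

locf = []
last_seen = None
for v in weekly_glucose:
    if v is None and last_seen is not None:
        locf.append(last_seen)  # carry the last observation forward
    else:
        locf.append(v)
        last_seen = v

print(locf)  # week 3 now carries the week 2 value, 97.5
```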

Handling multiplicity

Every time you perform a statistical significance test, you run a chance of being fooled by random fluctuations into thinking that some real effect is present in your data when, in fact, none exists (review Chapter 3 for a refresher on statistical testing). If you declare the results of a test statistically significant when in reality they are not, you are committing a Type I error. When you say that you require p < 0.05 to declare statistical significance, you're testing at the 0.05 (or 5 percent) alpha (α) level. This is another way of saying that you want to limit your Type I error rate to 5 percent. But that 5 percent error rate applies to each and every statistical test you run. The more analyses you perform on a data set, the more your overall α level increases. If you perform two tests at α = 0.05, the chance that at least one of them comes out falsely significant is about 10 percent. If you run 40 tests, the overall α level jumps to 87 percent! This is referred to as the problem of multiplicity, or as Type I error inflation.
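As a quick check on those figures, a short Python sketch; the formula 1 − (1 − α)^k assumes the k tests are independent:

```python
# Family-wise Type I error rate: the chance that at least one of k
# independent tests at a given alpha comes out falsely significant
# is 1 - (1 - alpha)**k.

def familywise_alpha(k, alpha=0.05):
    """Overall chance of at least one false positive across k tests."""
    return 1 - (1 - alpha) ** k

two_tests = familywise_alpha(2)     # ~0.0975, about 10 percent
forty_tests = familywise_alpha(40)  # ~0.87, the 87 percent in the text
```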